Extracting Tree Adjoining Grammars from Bracketed Corpora

Authors

  • Fei Xia
  • Martha Palmer
  • Carlos Prolo
  • Anoop Sarkar
Abstract

Fei Xia, Department of Computer and Information Science, University of Pennsylvania, 3401 Walnut Street, Suite 400A, Philadelphia, PA 19104, USA

In this paper, we report our work on extracting lexicalized tree adjoining grammars (LTAGs) from partially bracketed corpora. The algorithm first fully brackets the corpora, then extracts elementary trees (etrees), and finally filters out invalid etrees using linguistic knowledge. We show that the set of extracted etrees may not be complete enough to cover the whole language, but this will not have a big impact on parsing.
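
As a rough illustration of what a bracketed corpus looks like as input, the Python sketch below parses one Penn Treebank-style bracketed sentence into a nested tree and lists its (word, tag) leaves, the lexical items on which LTAG etrees are anchored. The bracketing format, the example sentence, and the names `parse_bracketed` and `leaves` are assumptions made for illustration. This is not the paper's extraction algorithm, whose details the abstract does not give.

```python
from typing import List, Tuple, Union

# A tree is either a leaf word (str) or a (label, children) pair.
Tree = Union[str, Tuple[str, list]]


def tokenize(text: str) -> List[str]:
    """Split a bracketed string into '(', ')' and symbol tokens."""
    return text.replace("(", " ( ").replace(")", " ) ").split()


def parse_bracketed(text: str) -> Tree:
    """Parse one bracketed constituent, e.g. '(S (NP (NNP John)) ...)'."""
    tokens = tokenize(text)
    pos = 0

    def parse() -> Tree:
        nonlocal pos
        if tokens[pos] != "(":
            word = tokens[pos]  # a bare token is a leaf word
            pos += 1
            return word
        pos += 1  # consume "("
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            children.append(parse())
        pos += 1  # consume ")"
        return (label, children)

    tree = parse()
    if pos != len(tokens):
        raise ValueError("trailing tokens after the first constituent")
    return tree


def leaves(tree: Tree) -> List[Tuple[str, str]]:
    """Return (word, tag) pairs for every preterminal, left to right."""
    if isinstance(tree, str):
        return []
    label, children = tree
    if len(children) == 1 and isinstance(children[0], str):
        return [(children[0], label)]
    pairs = []
    for child in children:
        pairs.extend(leaves(child))
    return pairs


# Hypothetical example sentence, not taken from the paper.
example = "(S (NP (NNP John)) (VP (VBZ likes) (NP (NNS apples))))"
tree = parse_bracketed(example)
anchors = leaves(tree)
```

Each pair in `anchors` is a candidate lexical anchor. Deciding which surrounding structure belongs to each anchor's etree is the part the paper's algorithm addresses and is deliberately left out here.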

Similar resources

Automatically Extracting and Comparing Lexicalized Grammars for Different Languages

In this paper, we present a quantitative comparison between the syntactic structures of three languages: English, Chinese and Korean. This is made possible by first extracting Lexicalized Tree Adjoining Grammars from annotated corpora for each language and then performing the comparison on the extracted grammars. We found that the majority of the core grammar structures for these three language...

Automated Extraction of TAGs from the Penn Treebank

The accuracy of statistical parsing models can be improved with the use of lexical information. Statistical parsing using lexicalized tree adjoining grammar (LTAG), a kind of lexicalized grammar, has remained relatively unexplored. We believe this is largely due to the absence of large corpora accurately bracketed in terms of a perspicuous yet broad-coverage LTAG. Our work attempts to a...

Contextual Tree Adjoining Grammars

In this paper, we introduce a formalism called contextual tree adjoining grammar (CTAG). CTAGs are a generalization of multi bracketed contextual rewriting grammars (MBICR) which combine tree adjoining grammars (TAGs) and contextual grammars. The generalization is to add a mechanism similar to obligatory adjoining in TAGs. Here, we present the definition of the model and some result...

Extraction of Tree Adjoining Grammars from a Treebank for Korean

We present the implementation of a system which extracts not only lexicalized grammars but also feature-based lexicalized grammars from the Korean Sejong Treebank. We report on some practical experiments where we extract TAG grammars and tree schemata. Above all, the full-scale syntactic tags and well-formed morphological analysis in the Sejong Treebank allow us to extract syntactic features. In addition, ...

From Treebanks to Tree-Adjoining Grammars

Grammars are valuable resources for natural language processing. A large-scale grammar may incorporate a vast amount of information on morphology, syntax, and semantics. Traditionally, grammars are built manually. Hand-crafted grammars often contain rich information, but require tremendous human effort to build and maintain. As large-scale treebanks have become available in the last decade, there ha...

Publication date: 2009